Corpus Augmentation for Improving Neural Machine Translation



Similar resources

Improving Chinese Grammatical Error Correction with Corpus Augmentation and Hierarchical Phrase-based Statistical Machine Translation

In this study, we describe our system submitted to the 2nd Workshop on Natural Language Processing Techniques for Educational Applications (NLP-TEA-2) shared task on Chinese grammatical error diagnosis (CGED). We use a statistical machine translation method already applied to several similar tasks (Brockett et al., 2006; Chiu et al., 2013; Zhao et al., 2014). In this research, we examine corpus...


Data Augmentation for Low-Resource Neural Machine Translation

The quality of a Neural Machine Translation system depends substantially on the availability of sizable parallel corpora. For low-resource language pairs this is not the case, resulting in poor translation quality. Inspired by work in computer vision, we propose a novel data augmentation approach that targets low-frequency words by generating new sentence pairs containing rare words in new, syn...


Improving a Multi-Source Neural Machine Translation Model with Corpus Extension for Low-Resource Languages

In machine translation, we often try to collect resources to improve performance. However, most of the language pairs, such as Korean-Arabic and Korean-Vietnamese, do not have enough resources to train machine translation systems. In this paper, we propose the use of synthetic methods for extending a low-resource corpus and apply it to a multi-source neural machine translation model. We showed ...


Improving Lexical Choice in Neural Machine Translation

We explore two solutions to the problem of mistranslating rare words in neural machine translation. First, we argue that the standard output layer, which computes the inner product of a vector representing the context with all possible output word embeddings, rewards frequent words disproportionately, and we propose to fix the norms of both vectors to a constant value. Second, we integrate a si...


Improving Low-Resource Neural Machine Translation with Filtered Pseudo-Parallel Corpus

Large-scale parallel corpora are indispensable to train highly accurate machine translators. However, manually constructed large-scale parallel corpora are not freely available in many language pairs. In previous studies, training data have been expanded using a pseudo-parallel corpus obtained using machine translation of the monolingual corpus in the target language. However, in low-resource lan...



Journal

Journal title: Computers, Materials & Continua

Year: 2020

ISSN: 1546-2226

DOI: 10.32604/cmc.2020.010265